Clustering Aggregation as Maximum-Weight Independent Set
Abstract
We formulate clustering aggregation as a special instance of the Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings, which are generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index that measures both cohesion and separation. The edges connect vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters that together partition the dataset. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure: since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of the simulated annealing method that takes advantage of this structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and uses a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that our approach to clustering aggregation (1) automatically decides the optimal number of clusters; (2) does not require any parameter tuning for the underlying clustering algorithms; (3) can combine the advantages of different underlying clustering algorithms to achieve superior performance; and (4) is robust against moderate or even bad input clusterings.
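The graph construction described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the function names `build_conflict_graph` and `brute_force_mwis` are hypothetical, the toy weights stand in for the internal cohesion/separation index (which the abstract does not specify), and an exhaustive search replaces the authors' simulated annealing variant, so it is only usable on a handful of clusters.

```python
from itertools import combinations


def build_conflict_graph(clusterings):
    """Distinct clusters become vertices; overlapping clusters get an edge."""
    vertices = []
    for clustering in clusterings:
        for cluster in clustering:
            c = frozenset(cluster)
            if c not in vertices:  # keep each distinct cluster once
                vertices.append(c)
    edges = {i: set() for i in range(len(vertices))}
    for i, j in combinations(range(len(vertices)), 2):
        if vertices[i] & vertices[j]:  # shared data point => conflict
            edges[i].add(j)
            edges[j].add(i)
    return vertices, edges


def brute_force_mwis(vertices, edges, weight):
    """Exhaustive MWIS (illustration only, exponential in the vertex count)."""
    best, best_w = (), float("-inf")
    n = len(vertices)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            # Skip subsets containing two adjacent (overlapping) clusters.
            if any(b in edges[a] for a, b in combinations(subset, 2)):
                continue
            w = sum(weight[vertices[i]] for i in subset)
            if w > best_w:
                best, best_w = subset, w
    return [vertices[i] for i in best], best_w


# Two input clusterings of the points 0..5 (toy data).
clustering_a = [{0, 1, 2}, {3}, {4, 5}]
clustering_b = [{0}, {1, 2}, {3, 4, 5}]

# Hypothetical vertex weights standing in for the internal index.
toy_weight = {
    frozenset({0, 1, 2}): 3.0,
    frozenset({3}): 0.2,
    frozenset({4, 5}): 1.0,
    frozenset({0}): 0.2,
    frozenset({1, 2}): 1.0,
    frozenset({3, 4, 5}): 3.0,
}

vertices, edges = build_conflict_graph([clustering_a, clustering_b])
best, best_weight = brute_force_mwis(vertices, edges, toy_weight)
print(sorted(sorted(c) for c in best), best_weight)
```

In this toy case the selected clusters come from different input clusterings, which is the kind of combination the abstract describes. The sketch does not separately check that the chosen clusters cover every data point.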
Similar resources
Clustering on k-Edge-Colored Graphs
We study the Max k-colored clustering problem, where, given an edge-colored graph with k colors, we seek to color the vertices of the graph so as to find a clustering of the vertices maximizing the number (or the weight) of matched edges, i.e. the edges having the same color as their extremities. We show that the cardinality problem is NP-hard even for edge-colored bipartite graphs with a chrom...
Distributed Approximation of Maximum Independent Set and Maximum Matching
We present a simple distributed ∆-approximation algorithm for maximum weight independent set (MaxIS) in the CONGEST model which completes in O(MIS(G) · log W) rounds, where ∆ is the maximum degree, MIS(G) is the number of rounds needed to compute a maximal independent set (MIS) on G, and W is the maximum weight of a node. Plugging in the best known algorithm for MIS gives a randomized solution ...
EIDA: An Energy-Intrusion aware Data Aggregation Technique for Wireless Sensor Networks
Energy consumption is considered a critical issue in wireless sensor networks (WSNs). Batteries of sensor nodes have limited power supply, which in turn limits the services and applications that they can support. An efficient solution to improve energy consumption and even traffic in WSNs is Data Aggregation (DA), which can reduce the number of transmissions. Two main challenges for DA are: (i)...
Restart and Random Walk in Local Search for Maximum Vertex Weight Cliques with Evaluations in Clustering Aggregation
The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and also important in real-world applications. In this paper we propose to use the restart and the random walk strategies to improve local search for MVWC. If a solution is revisited in some particular situation, the search will restart. In addition, when the local search has no other options except dropping vertices, it will use random ...
A Maximum Lifetime Algorithm for Data Gathering Without Aggregation in Wireless Sensor Networks
Data gathering in wireless sensor networks (WSNs) has attracted a lot of research attention. Data gathering can be done with or without aggregation, depending on the degree of correlation among the source data. In this paper, we study the problem of data gathering without aggregation, aiming to conserve the energy of sensor nodes so as to maximize the network lifetime. We model the problem a...
Publication date: 2012